Problem Note 54476: DBCS characters from SAS® Content Categorization code are incorrectly processed when used by the Text Rule Builder node
DBCS (double-byte character set) characters from SAS Content Categorization code are processed incorrectly when the code is used by the Text Rule Builder node in SAS® Text Miner. The node reads the characters as UTF-8. However, the underlying tgcode.txt file is built and saved using the session encoding, which might differ from UTF-8. This issue occurs for languages that contain characters outside the Latin1 character set.
If the data set and the corresponding code are saved in a SAS session that uses UTF-8 encoding, then the Text Rule Builder node uses the code correctly.
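The encoding mismatch described above can be illustrated outside of SAS. The following is a minimal Python sketch, not SAS code and not the node's actual implementation. It assumes Shift-JIS as a stand-in for a non-UTF-8 session encoding, and the sample text is hypothetical.

```python
# Illustration only: this is not SAS code. Shift-JIS is an assumed example
# of a DBCS session encoding, and the sample text is hypothetical.
text = "分類"

# Bytes as they would be written under a Shift-JIS session encoding.
session_bytes = text.encode("shift_jis")

# A reader that assumes UTF-8 cannot decode those bytes strictly.
try:
    session_bytes.decode("utf-8")
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False

# A lenient reader substitutes replacement characters, so the text is lost.
lenient = session_bytes.decode("utf-8", errors="replace")

# When the writer and the reader both use UTF-8, the text survives intact.
utf8_round_trip = text.encode("utf-8").decode("utf-8")

print("strict UTF-8 read succeeded:", strict_ok)
print("lenient read matches original:", lenient == text)
print("UTF-8 round trip matches original:", utf8_round_trip == text)
```

The last case corresponds to the situation in which both the data set and the code are saved in a UTF-8 session.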
Operating System and Release Information
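Product Family: | SAS System |
Product: | SAS Text Miner |
System | Product Release Reported | Product Release Fixed* | SAS Release Reported | SAS Release Fixed* |
Microsoft® Windows® for x64 | 5.1_M1 | 12.3 | 9.3 TS1M1 | 9.4 TS1M0 |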
Microsoft Windows Server 2008 R2 | 5.1_M1 | 12.3 | 9.3 TS1M1 | 9.4 TS1M0 |
Microsoft Windows Server 2008 for x64 | 5.1_M1 | 12.3 | 9.3 TS1M1 | 9.4 TS1M0 |
Windows 7 Enterprise x64 | 5.1_M1 | 12.3 | 9.3 TS1M1 | 9.4 TS1M0 |
Windows 7 Professional x64 | 5.1_M1 | 12.3 | 9.3 TS1M1 | 9.4 TS1M0 |
64-bit Enabled AIX | 5.1_M1 | 12.3 | 9.3 TS1M1 | 9.4 TS1M0 |
64-bit Enabled Solaris | 5.1_M1 | 12.3 | 9.3 TS1M1 | 9.4 TS1M0 |
HP-UX IPF | 5.1_M1 | 12.3 | 9.3 TS1M1 | 9.4 TS1M0 |
Linux for x64 | 5.1_M1 | 12.3 | 9.3 TS1M1 | 9.4 TS1M0 |
Solaris for x64 | 5.1_M1 | 12.3 | 9.3 TS1M1 | 9.4 TS1M0 |
* For software releases that are not yet generally available, the Fixed Release is the software release in which the problem is planned to be fixed.
Type: | Problem Note |
Priority: | high |
Topic: | Analytics ==> Data Mining Analytics ==> Text Mining |
Date Modified: | 2015-04-06 07:58:24 |
Date Created: | 2014-10-29 10:32:09 |